Index Support for Mining Data Streams in a Relational DBMS

Authors

  • Elena Baralis
  • Tania Cerquitelli
  • Silvia Chiusano
  • Diego Mostile
Abstract

This paper presents a novel index, called I-Forest, to support data mining activities on data streams, i.e., sequences of incoming data blocks. The approach is suited to itemset extraction on evolving datasets, such as the analysis of transactional data streams from retail chains. The index is a covering structure that represents transaction blocks in a succinct form and supports different kinds of analysis (e.g., analysis of quarterly data). No support constraint is enforced during the creation phase, so the index provides a complete representation of the data stream. The I-Forest index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Preliminary experiments have been run to validate the proposed approach.
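
The idea of indexing every block without a support threshold and applying the threshold only at extraction time can be illustrated with a toy sketch. This is not the I-Forest structure or its PostgreSQL implementation, neither of which the abstract describes in detail. The `BlockIndex` class, its methods, the block identifiers, and the sample transactions are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


class BlockIndex:
    """Toy per-block store (an assumption, not the I-Forest layout)."""

    def __init__(self):
        # Hypothetical layout: one Counter of transactions per block id.
        self.blocks = {}

    def add_block(self, block_id, transactions):
        # Index the whole block; nothing is pruned at creation time.
        self.blocks[block_id] = Counter(frozenset(t) for t in transactions)

    def frequent_itemsets(self, block_ids, min_support, max_size=2):
        # Count itemsets only over the blocks selected for this analysis.
        counts = Counter()
        for block_id in block_ids:
            for transaction, n in self.blocks[block_id].items():
                items = sorted(transaction)
                for size in range(1, max_size + 1):
                    for itemset in combinations(items, size):
                        counts[itemset] += n
        # The support constraint is applied here, at extraction time.
        return {s: c for s, c in counts.items() if c >= min_support}


index = BlockIndex()
index.add_block("2005-Q1", [["bread", "milk"], ["bread"], ["milk", "eggs"]])
index.add_block("2005-Q2", [["bread", "milk"], ["eggs"]])

# Analyze both quarters together with an absolute support threshold of 2.
result = index.frequent_itemsets(["2005-Q1", "2005-Q2"], min_support=2)
```

Because the threshold is a query parameter rather than a build parameter, the same stored blocks can answer a single-quarter query or a multi-quarter query with different thresholds, which is the property the abstract attributes to a complete, unconstrained index.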

Similar resources

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, many modern applications search for frequent, repeating patterns in the data sets they analyze. The search identifies patterns that appear frequently in the data set and marks them as frequent patterns, so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...


Efficient Item Set Mining Supported by IMine Index

This paper presents the IMine index, a general and compact structure which provides tight integration of item set extraction in a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. ...


The Drill Down Benchmark

Data Mining places specific requirements on DBMS query performance that cannot be evaluated satisfactorily using existing OLAP benchmarks. The DD Benchmark defined here provides a practical case and yardstick to explore how well a DBMS is able to support Data Mining applications. It was derived from real-life data mining tasks performed by our Data Surveyor™ tool running on a variety of DBMS b...


Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS

The conflict between resource consumption and query performance in the data mining context often has no satisfactory solution. This not only stands in sharp contrast to the need of the analysts for interactive response times, but also makes the seamless integration of data mining operators into common multiuser database systems a difficult and (so far) not very prosperous task. We believe that ...


On Reconfiguring Query Execution Plans in Distributed Object-Relational DBMS

Massive database sizes and growing demands for decision support and data mining result in long-running queries in extensible Object-Relational DBMS, particularly in decision support and data warehousing analysis applications. Parallelization of query evaluation is often required for acceptable performance. Yet queries are frequently processed suboptimally due to (1) only coarse or inaccurate es...



Publication date: 2005